arXiv:1509.01190v1 [cs.DS] 3 Sep 2015


Advanced Multilevel Node Separator Algorithms 


Peter Sanders and Christian Schulz 
Karlsruhe Institute of Technology, Karlsruhe, Germany 

{sanders,christian.schulz}@kit.edu


Abstract. A node separator of a graph is a subset S of the nodes such that removing S and its incident
edges divides the graph into two disconnected components of about equal size. In this work, we
introduce novel algorithms to find small node separators in large graphs. With focus on solution quality,
we introduce novel flow-based local search algorithms which are integrated in a multilevel framework.
In addition, we transfer techniques successfully used in the graph partitioning field. This includes the
usage of edge ratings tailored to our problem to guide the graph coarsening algorithm as well as highly
localized local search and iterated multilevel cycles to improve solution quality even further. Experiments
indicate that flow-based local search algorithms on their own in a multilevel framework are already
highly competitive in terms of separator quality. Adding additional local search algorithms further
improves solution quality. Our strongest configuration almost always outperforms competing systems
while on average computing 10% and 62% smaller separators than Metis and Scotch, respectively.


1 Introduction 

Given a graph $G = (V, E)$, the node separator problem asks to find three disjoint subsets $V_1$, $V_2$ and
$S$ of the node set, such that there are no edges between $V_1$ and $V_2$ and $V = V_1 \cup V_2 \cup S$. The objective
is to minimize the size of the separator $S$ or, depending on the application, the weight of its nodes
while $V_1$ and $V_2$ are balanced. Note that removing the set $S$ from the graph results in at least two
connected components. There are many algorithms that rely on small node separators. For example, 
small balanced separators are a popular tool in divide-and-conquer strategies [231I2TT3] . are useful 
to speed up the computations of shortest paths mmM or are necessary in scientific computing to 
compute fill reducing orderings with nested dissection algorithms [15].

Finding a balanced node separator on general graphs is NP-hard even if the maximum node 
degree is three PE]. Hence, one relies on heuristic and approximation algorithms to find small node 
separators in general graphs. The most commonly used method to tackle the node separator problem 
on large graphs in practice is the multilevel approach. During a coarsening phase, a multilevel 
algorithm reduces the graph size by iteratively contracting nodes and edges until the graph is small 
enough to compute a node separator by some other algorithm. A node separator of the input graph 
is then constructed by successively transferring the solution to the next finer graph and applying 
local search algorithms to improve the current solution. 

Current solvers are typically more than fast enough for most applications (for example |2 113) 1 
but lack high solution quality. In this work, we address this problem and focus on solution quality. 
The remainder of the paper is organized as follows. We begin in Section 2 by introducing basic
concepts and by summarizing related work. Our main contributions are presented in Section 3, where
we transfer techniques previously used for the graph partitioning problem to the node separator
problem and introduce novel flow-based local search algorithms for the problem that can be used
in a multilevel framework. This includes edge ratings to guide a graph coarsening algorithm within
a multilevel framework, highly localized local search to improve a node separator and iterated
multilevel cycles to improve solution quality even further. Experiments in Section 4 indicate that
our algorithms are able to provide excellent node separators and outperform other state-of-the-art
algorithms. Finally, we conclude with Section 5. All of our algorithms have been implemented in the
open source graph partitioning package KaHIP (.'lOj and will be available within this framework.

2 Preliminaries 
2.1 Basic concepts 

In the following we consider an undirected graph $G = (V = \{0, \ldots, n-1\}, E)$ with $n = |V|$, and
$m = |E|$. $\Gamma(v) := \{u : \{v,u\} \in E\}$ denotes the neighbors of a node $v$. A set $C \subset V$ of a graph is
called closed node set if there are no connections from $C$ to $V \setminus C$, i.e. for every node $u \in C$ an
edge $(u, v) \in E$ implies that $v \in C$ as well. In other words, a subset $C$ is a closed node set if there
is no edge starting in $C$ and ending in its complement $V \setminus C$. A graph $S = (V', E')$ is said to be a
subgraph of $G = (V, E)$ if $V' \subseteq V$ and $E' \subseteq E \cap (V' \times V')$. We call $S$ an induced subgraph when
$E' = E \cap (V' \times V')$. For a set of nodes $U \subseteq V$, $G[U]$ denotes the subgraph induced by $U$. We define
multiple partitioning problems. The graph partitioning problem asks for blocks of nodes $V_1, \ldots, V_k$
that partition $V$, i.e., $V_1 \cup \cdots \cup V_k = V$ and $V_i \cap V_j = \emptyset$ for $i \neq j$. A balancing constraint demands
that $\forall i \in \{1..k\} : |V_i| \le L_{\max} := (1 + \epsilon)\lceil |V|/k \rceil$ for some parameter $\epsilon$. In this case, the objective
is often to minimize the total cut $\sum_{i<j} |E_{ij}|$ where $E_{ij} := \{\{u, v\} \in E : u \in V_i, v \in V_j\}$. The set
of cut edges is also called edge separator. A node $v \in V_i$ that has a neighbor $w \in V_j$, $i \neq j$, is a
boundary node. An abstract view of the partitioned graph is the so-called quotient graph, where
nodes represent blocks and edges are induced by connectivity between blocks. The node separator
problem asks to find blocks $V_1$, $V_2$ and a separator $S$ that partition $V$ such that there are no edges
between the blocks. Again, a balancing constraint demands $|V_i| \le (1 + \epsilon)\lceil |V|/k \rceil$. However, there is
no balancing constraint on the separator $S$. The objective is to minimize the size of the separator
$|S|$. Note that removing the set $S$ from the graph results in at least two connected components and
that the blocks $V_i$ themselves do not need to be connected components. By default, our initial inputs
will have unit edge and node weights. However, the results in this paper are easily transferable to
node and edge weighted problems. A matching $M \subseteq E$ is a set of edges that do not share any
common nodes, i.e. the graph $(V, M)$ has maximum degree one. Contracting an edge $\{u, v\}$ means
to replace the nodes $u$ and $v$ by a new node $x$ connected to the former neighbors of $u$ and $v$. We set
$c(x) = c(u) + c(v)$. If replacing edges of the form $\{u,w\}$, $\{v,w\}$ would generate two parallel edges
$\{x,w\}$, we insert a single edge with $\omega(\{x,w\}) = \omega(\{u,w\}) + \omega(\{v,w\})$. Uncontracting an edge $e$
undoes its contraction. In order to avoid tedious notation, $G$ will denote the current state of the
graph before and after a (un)contraction unless we explicitly want to refer to different states.
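
The contraction rule above can be illustrated with a minimal Python sketch. It is not taken from the KaHIP implementation; the adjacency-dictionary representation and the function name `contract_edge` are assumptions made for illustration only.

```python
def contract_edge(adj, node_weight, u, v, x):
    """Contract the edge {u, v} into a new node x (illustrative sketch).

    adj:         dict node -> dict neighbor -> edge weight (symmetric).
    node_weight: dict node -> node weight c(.).
    Both dictionaries are modified in place.
    """
    assert v in adj[u], "can only contract an existing edge"
    merged = {}
    for old in (u, v):
        for w, weight in adj[old].items():
            if w in (u, v):
                continue  # the contracted edge itself disappears
            # Parallel edges {u,w} and {v,w} collapse into one edge {x,w}
            # whose weight is the sum of both.
            merged[w] = merged.get(w, 0) + weight
    # Detach u and v from their former neighbors.
    for old in (u, v):
        for w in adj[old]:
            if w not in (u, v):
                del adj[w][old]
        del adj[old]
    # Attach the new node x.
    adj[x] = merged
    for w, weight in merged.items():
        adj[w][x] = weight
    node_weight[x] = node_weight.pop(u) + node_weight.pop(v)
    return x
```

Contracting one edge of a unit-weight triangle, for example, leaves a single edge of weight two between the new node and the third corner.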

The multilevel approach consists of three main phases. In the contraction (coarsening) phase,
we iteratively identify matchings $M \subseteq E$ and contract the edges in $M$. Contraction should quickly
reduce the size of the input and each computed level should reflect the global structure of the input
network. Contraction is stopped when the graph is small enough so that the problem can be solved
by some other potentially more expensive algorithm. In the local search (or uncoarsening) phase,
matchings are iteratively uncontracted. After uncontracting a matching, the local search algorithm
moves nodes to decrease the size of the separator or to improve balance of the block while keeping
the size of the separator. The succession of movements is based on priorities called gain, i.e., the
decrease in the size of the separator. The intuition behind the approach is that a good solution at
one level of the hierarchy will also be a good solution on the next finer level so that local search will
quickly find a good solution.

2.2 Related Work


There has been a huge amount of research on graph partitioning so that we refer the reader to 
m for most of the material in this area. Here, we focus on issues closely related to our main 
contributions and previous work on the node separator problem. Lipton and Tarjan [22] provide the
planar separator theorem stating that on planar graphs one can always find a separator $S$ in linear
time that satisfies $|S| \in O(\sqrt{|V|})$ and $|V_i| \le 2|V|/3$. For more balanced cases, the problem remains
NP-hard [13] even on planar graphs.

For general graphs there exist several heuristics to compute small node separators. A common 
and simple method is to derive a node separator from an edge separator [28,32] which is usually
computed by a multilevel graph partitioning algorithm. Clearly, taking the boundary nodes of the
edge separator in one block of the partition yields a node separator. Since one is interested in a small
separator, one can use the smaller set of boundary nodes. A better method was first described
by Pothen and Fan [28]. The method employs the set of cut edges of the partition and computes the
smallest node separator that can be found by using a subset of the boundary nodes. The main idea 
is to compute a subset S of the boundary nodes such that each cut edge is incident to at least one 
of the nodes in S (a vertex cover). A problem of the method is that the graph partitioning problem 
with edge cut as objective has a somewhat different combinatorial structure compared to the node 
separator problem. This makes it unlikely to find high quality solutions with that approach. 
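
The simpler of the two derivations, taking the smaller set of boundary nodes, can be sketched in a few lines of Python. This is an illustration under assumed data structures (an adjacency dictionary and a block map with blocks labeled 0 and 1); it is not the vertex-cover-based method of Pothen and Fan.

```python
def separator_from_partition(adj, block):
    """Derive a node separator from a two-way partition (illustrative sketch).

    adj:   dict node -> iterable of neighbors.
    block: dict node -> 0 or 1 (the two blocks of the edge separator).
    Returns the smaller of the two sets of boundary nodes.
    """
    boundary = {0: set(), 1: set()}
    for u, neighbors in adj.items():
        # u is a boundary node if at least one neighbor lies in the other block.
        if any(block[w] != block[u] for w in neighbors):
            boundary[block[u]].add(u)
    # Removing either boundary set destroys all cut edges; keep the smaller one.
    return min(boundary[0], boundary[1], key=len)
```

Every cut edge has exactly one endpoint in each boundary set, so removing either set leaves no edge between the remaining parts of the two blocks.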

Metis [19] and Scotch [26] use a multilevel approach to obtain a node separator. After contraction,
both tools compute a node separator on the coarsest graph using a greedy algorithm. This
separator is then transferred level-by-level, dropping non-needed nodes on each level and applying 
Fiduccia-Mattheyses (FM) style local search. Previous versions of Metis and Scotch also included 
the capability to compute a node separator from an edge separator. 

Recently, Hamann and Strasser m presented a max-flow based algorithm specialized for road 
networks. Their main focus is not on node separators. They focus on a different formulation of 
the edge-cut version graph partitioning problem. More precisely, Hamann and Strasser find Pareto 
solutions in terms of edge cut versus balance instead of specifying the allowed amount of imbalance 
in advance and finding the best solution satisfying the constraint. Their work also includes an
algorithm to derive node separators, again in a different formulation of the problem, i.e. node
separator size versus balance. We cannot make meaningful comparisons since the paper contains
no data on separator quality and the implementation of the algorithm is not available. 

Hager et al. m recently proposed a multilevel approach for medium sized graphs using continuous
bilinear quadratic programs and a combination of those with local search algorithms. However, a
different formulation of the problem is investigated, i.e. the solver enforces upper and lower bounds 
to the block sizes which makes the results incomparable to our results. 

LaSalle and Karypis [20] present a shared-memory parallel algorithm to compute node separators
used to compute fill reducing orderings. Within a multilevel approach they evaluate different
local search algorithms indicating that a combination of greedy local search with a segmented FM
algorithm can outperform serial FM algorithms. We compare the solution quality of our algorithm
against the data presented there in our experimental section (see Section 4).

3 Advanced Multilevel Algorithms for Node Separators

We now present our core innovations. In brief, the novelties of our algorithm include edge ratings
during coarsening to compute graph hierarchies that fulfill the needs of the node separator problem
and a combination of localized local search with flow problems to improve the size of the separator.
In addition, we transfer a concept called iterative multilevel scheme previously used in graph
partitioning to further improve solution quality. The description of our algorithm in this section 
follows the multilevel scheme. We start with the description of the edge ratings that we use during 
coarsening, continue with the description of the algorithm used to compute an initial node separator 
on the coarsest level and then describe local search algorithms as well as other techniques. 

3.1 Coarsening 

Before we explain the matching algorithm that we use in our system, we present the general two-phase
procedure which was already used in multiple graph partitioning frameworks [18,29,25]. The
two-phase approach makes contraction more systematic by separating two issues: a rating function
and a matching algorithm. A rating function indicates how much sense it makes to contract an
edge based on local information. A matching algorithm tries to maximize the sum of the ratings of
the contracted edges looking at the global structure of the graph. While the rating function allows
a flexible characterization of what a "good" contracted graph is, the simple, standard definition of
the matching problem allows us to reuse previously developed algorithms for weighted matching. Note
that we can use the same edge rating functions as in the graph partitioning case but can also define
new ones since the problem structure of the node separator problem is different.

We use the Global Path Algorithm (GPA) which runs in near linear time to compute matchings.
GPA was proposed in [24] as a synthesis of the Greedy Algorithm and the Path Growing Algorithm [12].
We choose this algorithm since in [18] it gives empirically considerably better results
than Sorted Heavy Edge Matching, Heavy Edge Matching or Random Matching |3T]. GPA scans
the edges in order of decreasing weight but rather than immediately building a matching, it first 
constructs a collection of paths and even length cycles. Afterwards, optimal solutions are computed 
for each of these paths and cycles using dynamic programming. 

Edge Ratings for Node Separator Problems. We want to guide the contraction algorithm so that 
coarse levels in the graph hierarchy still contain small node separators if present in the input 
problem. This way we can provide a good starting point for the initial node separator routine. 
There are a lot of possibilities that we have tried. The most important edge rating functions for an 
edge $e = \{u, v\} \in E$ are the following:

$\mathrm{exp}^{*}(e) = \omega(e) / (d(u)\,d(v))$
$\mathrm{exp}^{**}(e) = \omega(e)^2 / (d(u)\,d(v))$
$\max(e) = 1 / \max\{d(u), d(v)\}$
$\log(e) = 1 / \log(d(u)\,d(v))$

The first two ratings have already been successfully used in the graph partitioning field. To give an 
intuition behind these ratings, we have to characterize the properties of “good” matchings for the 
purpose of contraction in a multilevel algorithm for the node separator problem. Our main objective 
is to find a small node separator on the coarsest graph. A matching should contain a large number
of edges, e.g. being maximal, so that there are only few levels in the hierarchy and the algorithm can
converge quickly. In order to represent the input on the coarser levels, we want to find matchings
such that the graph after contraction has somewhat uniform node weights and small node degrees.
In addition, we want to keep nodes having a small degree since they are potentially good separators.
Uniform node weights are also helpful to achieve a balanced node separator on coarser levels and
make local search algorithms more effective. We also included ratings that do not contain the edge
weight of the graph since intuitively a matching does not have to care about large edge weights:
they do not show up in the objective of the node separator problem.
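
For concreteness, the four ratings can be evaluated as in the following Python sketch. It assumes that $d(\cdot)$ denotes the node degree and that the graph is stored as a weighted adjacency dictionary; the function name and the treatment of the degenerate case $d(u)d(v) = 1$ (where the logarithm vanishes) are our assumptions, not part of the original description.

```python
import math


def edge_ratings(adj, u, v):
    """Evaluate the four edge ratings for the edge {u, v} (illustrative sketch).

    adj: dict node -> dict neighbor -> edge weight.
    Assumption: d(x) is the degree of x, i.e. len(adj[x]).
    """
    w = adj[u][v]
    du, dv = len(adj[u]), len(adj[v])
    return {
        "exp*": w / (du * dv),
        "exp**": w * w / (du * dv),
        "max": 1 / max(du, dv),
        # log(1) = 0 would divide by zero; treating this case as an
        # infinitely attractive edge is an assumption of this sketch.
        "log": 1 / math.log(du * dv) if du * dv > 1 else math.inf,
    }
```

The last two ratings ignore the edge weight entirely, which matches the remark that edge weights do not appear in the objective of the node separator problem.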

3.2 Initial Node Separators 

We stop coarsening as soon as the graph has fewer than ten thousand nodes. Our approach first
computes an edge separator and then derives a node separator from that. More precisely, we partition
the coarsest graph into two blocks using KaFFPa m. We then look at the bipartite graph induced
by the set of cut edges including the given node weights. Our goal is to select a minimum weight node
separator in that graph. As a side note, this corresponds to finding a minimum weight vertex cover
in the bipartite graph. Also note that this is similar to the approach of Pothen et al. [28]; however,
we integrate node weights. To solve the problem, we put all of the nodes of the bipartite graph
into the initial separator $S$ and use the flow-based technique defined below to select the smallest
separator contained in that subgraph. Since our algorithms are randomized, we repeat the overall
procedure twenty-five times and pick the best node separator that we have found.

3.3 Local Search 

Localized Local Search. In graph partitioning it has been shown that higher localization of local
search can improve solution quality [32,25]. Hence, we develop a novel localized algorithm for the
node separator problem that starts local search only from a couple of selected separator nodes. Our
localized local search procedure is based on the FM scheme. Before we explain our approach to
localization, we present a commonly used FM-variant for completeness.

For each of the two blocks $V_1$, $V_2$ under consideration, a priority queue of separator nodes eligible
to move is kept. The priority is based on the gain concept, i.e. the decrease in the objective function
value when the separator node is moved into that block. More precisely, if a node $v \in S$ would be
moved to $V_1$, then the neighbors of $v$ that are in $V_2$ have to be moved into the separator. Hence,
in this case the gain of the node is the weight of $v$ minus the weight of the nodes that have to be
added to the separator. The gain value in the other case (moving $v$ into $V_2$) is similar. After the
algorithm computed both gain values it chooses the largest gain value such that moving the node
does not violate the balance constraint and performs the movement. Each node is moved at most
once out of the separator within a single local search. The queues are initialized randomly with the
separator nodes. After a node is moved, newly added separator nodes become eligible for movement
(and hence are added to the priority queues).
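
The gain computation and the effect of a move can be made explicit with a small Python sketch. The block labels `1`, `2` and `'S'`, the dictionaries and the function names are illustrative assumptions; priority queues, balance checks and rollback are deliberately left out.

```python
def gain(adj, weight, block, v, target):
    """Gain of moving separator node v into block `target` (1 or 2).

    The neighbors of v in the opposite block must enter the separator,
    so the gain is c(v) minus the total weight of those neighbors.
    """
    other = 2 if target == 1 else 1
    pulled_in = [w for w in adj[v] if block[w] == other]
    return weight[v] - sum(weight[w] for w in pulled_in)


def move(adj, block, v, target):
    """Move separator node v into `target` and repair the separator."""
    other = 2 if target == 1 else 1
    block[v] = target
    for w in adj[v]:
        if block[w] == other:
            block[w] = "S"  # w would otherwise be adjacent to the other block
```

A node with one neighbor in $V_1$ and two in $V_2$ (unit weights) thus has gain $-1$ towards $V_1$ and gain $0$ towards $V_2$.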

There are different possibilities to select a block to which a node shall be moved. The most
common variant of the classical FM-algorithm alternates between both blocks. After a stopping
criterion is applied, the best feasible node separator found is reconstructed (among ties choose the
node separator that has better balance). We have two strategies to balance blocks. The first strategy
tries to create a balanced situation without increasing the size of the separator. It always selects the
queue of the heavier block and uses the same roll back mechanism as before. The second strategy
allows the size of the node separator to increase. It also selects a node from the queue of the heavier
block, but the roll back mechanism recreates the node separator having the best balance (among
ties we choose the smaller node separator).

Our approach to localization works as follows. Previous local search methods were initialized
with all separator nodes, i.e. all separator nodes are eligible for movement at the beginning. In
contrast, our method is repeatedly initialized only with a subset of the separator nodes (the precise
number of nodes in the subset is a tuning parameter). Intuitively, this introduces a larger amount
of diversification and boosts the algorithm's ability to climb out of local minima.

The algorithm is organized in rounds. One round works as follows. Instead of putting all separator
nodes directly into the priority queues, we put the current separator nodes into a todo list
$\mathcal{T}$. Subsequently, we begin local search starting with a random subset $\mathcal{S}$ of the todo list $\mathcal{T}$. We
select the subset $\mathcal{S}$ by repeatedly picking a random node $v$ from $\mathcal{T}$. We add $v$ to $\mathcal{S}$ if it still is a
separator node and has not been moved by a previous local search in that round. Either way, $v$ is
removed from the todo list. Our localized search is restricted to the movement of nodes that have
not been touched by a previous local search during the round. This ensures that each node is moved
at most once out of the separator during a round of the algorithm and avoids cyclic local search.
By default our local search routine first uses classic local search (including balancing) to get close
to a good solution and afterwards uses localization to improve the result further. We repeat this
until no further improvement is found.
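
The selection of the start nodes within a round can be sketched as follows. This is a minimal Python illustration under assumed names (`pick_start_nodes`, the predicate `is_separator`, the set `moved`); the subsequent FM search itself is not shown.

```python
import random


def pick_start_nodes(todo, is_separator, moved, k, rng):
    """Draw up to k start nodes for one localized search (illustrative sketch).

    todo:         list of candidate nodes (the todo list), modified in place.
    is_separator: callable node -> bool, current separator membership.
    moved:        set of nodes already moved by a search in this round.
    rng:          a random.Random instance.
    """
    start = []
    while todo and len(start) < k:
        # Pick a random node and remove it from the todo list either way.
        i = rng.randrange(len(todo))
        todo[i], todo[-1] = todo[-1], todo[i]
        v = todo.pop()
        if is_separator(v) and v not in moved:
            start.append(v)
    return start
```

A round would call this repeatedly until the todo list is empty, running one localized search per returned subset.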

We now give intuition why localization of local search boosts the algorithm's ability to climb out
of local minima. Consider a situation in which a node separator is locally optimal in the sense that
at least two node movements are necessary until moving a node out of the separator with positive
gain is possible. Recall that classical local search is initialized with all separator nodes (in this case
all of them have negative gain values). It then starts to move nodes with negative gain at multiple
places of the graph. When it finally moves nodes with positive gain the separator is already much
worse than the input node separator. Hence, the movement of these positive gain nodes does not
yield an improvement with respect to the given input partition. On the other hand, a localized local
search that starts close to the nodes with positive gain can find the positive gain nodes by moving
only a small number of nodes with negative gain. Since it did not move as many negative gain nodes
as the classical local search, it may still find an improvement with respect to the input.

Maximum Flows as Local Search. We define the node-capacitated flow problem $\mathcal{F} = (V_\mathcal{F}, E_\mathcal{F})$ that
we solve to improve a given node separator as follows. First we introduce a few notations. Given a set
of nodes $A \subset V$, we define its border $\partial A := \{u \in A \mid \exists (u, v) \in E : v \notin A\}$. The set $\partial_1 A := \partial A \cap V_1$
is called left border of $A$ and the set $\partial_2 A := \partial A \cap V_2$ is called right border of $A$. An $A$ induced flow
problem $\mathcal{F}$ is the node induced subgraph $G[A]$ using $\infty$ as edge-capacities and the node weights of
the graph as node-capacities. Additionally there are two nodes $s$, $t$ that are connected to the border
of $A$. More precisely, $s$ is connected to all left border nodes $\partial_1 A$ and all right border nodes $\partial_2 A$ are
connected to $t$. These new edges get capacity $\infty$. Note that the additional edges are directed. $\mathcal{F}$
has the balance property if each $(s,t)$-flow induces a balanced node separator in $G$, i.e. the blocks $V_i$
fulfill the balancing constraint. The basic idea is to construct a flow problem $\mathcal{F}$ having the balance
property. We now explain how we find such a subgraph. We start by setting $A$ to $S$ and extend
it by performing two breadth first searches (BFS). The first BFS is initialized with the current
separator nodes $S$ and only looks at nodes in block $V_1$. The same is done during the second BFS
with the difference that we now look at nodes from block $V_2$. Each node touched by any of the BFS
is added to $A$. The first BFS is stopped as soon as the size of the newly added nodes would exceed
$L_{\max} - c(V_2) - c(S)$. Similarly, the second BFS is stopped as soon as the size of the newly added
nodes would exceed $L_{\max} - c(V_1) - c(S)$.

Fig. 1. The construction of an $A$ induced flow problem is shown. Two breadth first searches are started to define
the area $A$: one into the block on the left hand side and one into the block on the right hand side. A solution of the
flow problem yields the smallest node separator that can be found within the area. The area $A$ is chosen so that each
node separator that can be found in the area yields a feasible separator for the original problem.
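
The construction of the area $A$ by two bounded breadth first searches can be sketched in Python as follows. The representation (adjacency lists, block labels `1`, `2`, `'S'`) and the function names are assumptions made for illustration; the sketch only builds $A$ and does not set up or solve the flow problem.

```python
from collections import deque


def grow_area(adj, weight, block, separator, side, budget):
    """BFS from the separator into block `side`, bounded by `budget`.

    Stops as soon as adding the next node would push the total weight
    of the newly added nodes above the budget.
    """
    area, added = set(), 0
    visited = set(separator)
    queue = deque(separator)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in visited or block[w] != side:
                continue
            if added + weight[w] > budget:
                return area
            visited.add(w)
            area.add(w)
            added += weight[w]
            queue.append(w)
    return area


def build_area(adj, weight, block, separator, l_max):
    """Return A = S plus the nodes reached by both bounded searches."""

    def block_weight(b):
        return sum(weight[v] for v in adj if block[v] == b)

    c_s = sum(weight[v] for v in separator)
    area = set(separator)
    area |= grow_area(adj, weight, block, separator, 1, l_max - block_weight(2) - c_s)
    area |= grow_area(adj, weight, block, separator, 2, l_max - block_weight(1) - c_s)
    return area
```

On a path of seven unit-weight nodes with the middle node as separator and $L_{\max} = 5$, each search may add exactly one node, so $A$ consists of the separator and its two neighbors.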

A solution of the $A$ induced flow problem yields a valid node separator of the original graph:
First, since all edges in our flow network have capacity $\infty$ and the separator $S$ is contained in
the problem, a maximum flow yields a separator $S'$, $V_\mathcal{F} = V_1' \cup V_2' \cup S'$, in the flow network that
separates $s \in V_1'$ from $t \in V_2'$. Since there is a one-to-one mapping between the nodes of our flow
problem and the nodes of the input graph, we directly obtain a separator in the original network
$V = V_1^* \cup V_2^* \cup S'$. Additionally, the node separator computed by our method fulfills the balance
constraint, presuming that the input solution is balanced. To see this, we consider the size of $V_1^*$.
We can bound the size of this block by assuming that all of the nodes that have been touched by the
second BFS get assigned to $V_1^*$ (including the old separator $S$). However, in this case the balance
constraint is still fulfilled: $c(V_1^*) \le c(V_1) + c(S) + L_{\max} - c(V_1) - c(S) = L_{\max}$. The same holds for
the opposite direction. Note that the separator is always smaller than or equal to the input separator
since $S$ is contained in the construction.

To solve the node-capacitated flow problem $\mathcal{F}$, we transform it into a flow problem $\mathcal{H}$ without
node-capacities. We use a standard technique [1]: first we insert the source and the sink into our
model. Then, for each node $u$ in our flow problem $\mathcal{F}$ that is not the source or the sink, we introduce
two nodes $u_1$ and $u_2$ in $V_\mathcal{H}$ which are connected by a directed edge $(u_1, u_2) \in E_\mathcal{H}$ with an edge-capacity
set to the node-capacity of the current node. For an edge $(u, v) \in E_\mathcal{F}$ not involving the
source or the sink, we insert $(u_2, v_1)$ into $E_\mathcal{H}$ with capacity $\infty$. If $u$ is the source $s$, we insert $(s, v_1)$
and if $v$ is the sink, we insert $(u_2, t)$ into $E_\mathcal{H}$. In both cases we use capacity $\infty$.
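
The node-splitting transformation can be written down directly. The following Python sketch uses an assumed edge-list representation and represents the two copies of a node $u$ as the tuples `(u, 1)` and `(u, 2)`; it only builds the edge-capacitated network and leaves the choice of max-flow solver open.

```python
INF = float("inf")


def split_nodes(nodes, edges, capacity, s, t):
    """Turn a node-capacitated network into an edge-capacitated one (sketch).

    nodes:    inner nodes of the flow problem (without s and t).
    edges:    directed edges (u, v); an undirected edge of G[A] is expected
              to appear in both directions.
    capacity: dict inner node -> node capacity.
    Returns a dict (tail, head) -> edge capacity.
    """
    network = {}
    for u in nodes:
        # The internal edge (u_1, u_2) carries the node capacity of u.
        network[((u, 1), (u, 2))] = capacity[u]
    for u, v in edges:
        tail = s if u == s else (u, 2)  # leave a node through its out-copy
        head = t if v == t else (v, 1)  # enter a node through its in-copy
        network[(tail, head)] = INF
    return network
```

Since only the internal edges have finite capacity, every finite $(s,t)$-cut in the resulting network consists of internal edges and therefore corresponds to a set of nodes of the original flow problem.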

Larger Flow Problems and Better Balanced Node Separators. The definition of the flow problem to
improve a node separator requires that each cut in the flow problem corresponds to a balanced node
separator in the original graph. We now simplify this definition and stop the BFSs if the size of
the touched nodes exceeds $(1 + \alpha)L_{\max} - c(V_i) - c(S)$ with $\alpha \ge 0$. We then solve the flow problem
and check afterwards if the corresponding node separator is balanced. If this is the case, we accept
the node separator and continue. If this is not the case, we set $\alpha := \alpha/2$ and repeat the procedure.
After ten unsuccessful iterations, we set $\alpha = 0$. Additionally, we stop the process if the flow value
of the flow problem corresponds to the separator weight of the input separator.
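
The control flow of this adaptive scheme can be sketched as below. The callbacks `solve` and `is_balanced` are hypothetical placeholders for building and solving the flow problem for a given $\alpha$ and for the balance check; returning `None` to signal "keep the input separator" is likewise an assumption of this sketch.

```python
def adaptive_flow_step(solve, is_balanced, input_weight, alpha, max_tries=10):
    """Try flow problems of shrinking size until a balanced separator is found.

    solve(alpha)      -> (separator, flow_value), hypothetical callback.
    is_balanced(sep)  -> bool, hypothetical callback.
    Returns the accepted separator, or None if no improvement is possible.
    """
    for _ in range(max_tries):
        separator, flow_value = solve(alpha)
        if flow_value >= input_weight:
            return None  # flow value equals the input separator weight: stop
        if is_balanced(separator):
            return separator
        alpha /= 2  # too unbalanced: shrink the flow problem and retry
    # With alpha = 0 the flow problem has the balance property by construction.
    separator, flow_value = solve(0.0)
    return separator if flow_value < input_weight else None
```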

We apply heuristics to extract a better balanced node separator from the solved max-flow problem.
Picard and Queyranne [27] made the observation that one $(s,t)$-max-flow contains information
about all minimum $(s,t)$-cuts in the graph (however, finding the most balanced minimum cut is
NP-hard 0). We follow the heuristic approach of [29] and extract better balanced $(s,t)$-cuts from
the given maximum flow in $\mathcal{H}$. This results in better balanced separators in the node-capacitated
problem $\mathcal{F}$ and hence in better balanced node separators for our original problem.

Fig. 2. Left: the set $C = \{a,d,e,f\}$ is a closed node set since no edge is starting in $C$ and ending in $V \setminus C$. Right:
using a reverse topological ordering of a DAG one can output multiple closed node sets.

To be more precise, Picard and Queyranne have shown that each closed node set in the residual 
graph of a maximum (s, t)-flow that contains the source s but not the sink induces a minimum s-t 
cut. Observe that a cycle in the residual graph cannot contain a node of both, a closed node set 
and its complement. Hence, Picard and Queyranne compactify the residual network by contracting 
all strongly connected components. Afterwards, their algorithm tries to find the most balanced 
minimum cut by enumeration. In [29], we find better balanced cuts heuristically. First a random
topological order of the strongly connected component graph is computed. This is then scanned in
reverse order. By subsequently adding strongly connected components several closed node sets are
obtained, each inducing a minimum s-t cut. The closed node set with the best occurred balance
among multiple runs of the algorithm with different random topological orders is returned. An
example closed node set and the scanning algorithm are shown in Figure 2.
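
The scanning step can be illustrated on an arbitrary DAG, standing in for the graph of strongly connected components of the residual network. The Python sketch below uses assumed names and an adjacency-list dictionary; computing the residual graph, contracting its components, filtering for sets that contain $s$ but not $t$, and evaluating the balance are all omitted.

```python
import random


def random_topological_order(dag, rng):
    """Kahn's algorithm with random tie-breaking among ready nodes."""
    indegree = {v: 0 for v in dag}
    for v in dag:
        for w in dag[v]:
            indegree[w] += 1
    ready = [v for v in dag if indegree[v] == 0]
    order = []
    while ready:
        v = ready.pop(rng.randrange(len(ready)))
        order.append(v)
        for w in dag[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return order


def closed_node_sets(dag, rng):
    """Scan a random topological order in reverse and collect closed sets.

    Every suffix of a topological order is closed: all edges point
    forward in the order, so no edge can leave the suffix.
    """
    order = random_topological_order(dag, rng)
    closed, current = [], set()
    for v in reversed(order):
        current.add(v)
        closed.append(frozenset(current))
    return closed
```

Repeating the scan with different random orders yields different families of closed node sets, from which the best balanced one would be kept.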

3.4 Miscellanea 

An easy way to obtain high quality node separators is to use a multilevel algorithm multiple times 
using different random seeds and use the best node separator that has been found. However, instead 
of performing a full restart, one can use the information that has already been obtained. In the graph 
partitioning context, the notion of iterated multilevel schemes has been introduced by Walshaw [35]
and later has been augmented to more complex cycles [29]. Here, one transfers a solution of a
previous multilevel cycle down the hierarchy and uses it as initial solution. More precisely, this can
be done by not contracting any cut edge.

We transfer this technique to the node separator problem as follows. One can interpret a node
separator as a three way partition $V_1, V_2, S$. Hence, to obtain an iterated multilevel scheme for the
node separator problem, our matching algorithm is not allowed to match any edge that runs between
$V_i$ and $S$ ($i = 1,2$). Hence, when contraction is done, every edge leaving the separator will remain
and we can transfer the node separator down in the hierarchy. Thus a given node separator can be 
used as initial node separator of the coarsest graph (having the same balance and size as the node 
separator of the finest graph). This ensures non-decreasing quality, if the local search algorithm 
guarantees no worsening. To increase diversification during coarsening in later V-cycles we pick a 
random edge rating of the ones described above. 
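
The restriction on the matching can be expressed as a simple filter on the candidate edges. The following Python sketch assumes an edge list and a block map with labels `1`, `2` and `'S'`; the function name is ours.

```python
def matchable_edges(edges, block):
    """Edges the matching may use in a later V-cycle (illustrative sketch).

    An edge between V_i and S must not be contracted. Since a valid node
    separator has no edges between V_1 and V_2, exactly the edges whose
    endpoints share a block remain.
    """
    return [(u, v) for u, v in edges if block[u] == block[v]]
```

Because contracted nodes never mix blocks, the separator of the finest graph maps to a separator of the same weight and balance on every coarser level.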

4 Experiments


Methodology. We have implemented the algorithm described above within the KaHIP framework 
using C++ and compiled all algorithms using gcc 4.63 with full optimizations turned on (-O3
flag). We integrated our algorithms in KaHIP v0.71 and compare ourselves against Metis 5.1 and 
Scotch 6.0.4 using the quality option that has focus on solution quality instead of running time. 
Our new codes will be included into the KaHIP graph partitioning framework. We perform ten 
repetitions of each algorithm using different random seeds for initialization. Each run was made on 
a machine that has four Octa-Core Intel Xeon E5-4640 processors running at 2.4 GHz. It has 512 
GB local memory, 20 MB L3-Cache and 8x256 KB L2-Cache. Our main objective is the cardinality 
of node separators on the input graph. In our experiments, we use $\epsilon = 20\%$ since this is the default
value for node separators in Metis. We mostly present two kinds of views on the data: average values 
and minimum values as well as plots that show the ratios of the quality achieved by the algorithms. 

Algorithm Configuration. We performed a number of experiments to evaluate the influence and 
choose the parameters of our algorithms. We mark the instances that have also been used for 
the parameter tuning in Appendix with a * and exclude these graphs when we report average 
values over multiple instances in comparisons with our competitors. However, our full algorithm 
is not too sensitive to the precise choice of most of the parameters. In general, using more
sophisticated edge ratings improves solution quality slightly and improves partitioning speed over 
using edge weight. We exclude further experiments from the main text and use the exp* edge rating 
function as a default since it has a slight advantage in our preliminary experiments. In later iterated 
multilevel cycles, we pick one of the other ratings at random to introduce more diversification. 
Indeed, increasing the number of V-cycles reduces the objective function. We fixed the number 
of V-cycles to three. By default, we use the better balanced minimum cut heuristic in our node 
separator algorithm since it keeps the node separator cardinality and improves balance. In the 
localized local search algorithm, we set the size of the random subset of separator nodes from which 
local search is started |5| to five. 

Instances. We use graphs from various sources to test our algorithm. We use all 34 graphs from Chris
Walshaw's benchmark archive [31]. Graphs derived from sparse matrices have been taken from the
Florida Sparse Matrix Collection [8]. We also use graphs from the 10th DIMACS Implementation
Challenge [2] website. Here, rggX is a random geometric graph with $2^X$ nodes where nodes represent
random points in the unit square and edges connect nodes whose Euclidean distance is below
$0.55\sqrt{\ln n / n}$. The graph delX is a Delaunay triangulation of $2^X$ random points in the unit square.
The graphs af_shell9, thermal2, nlr and nlpkkt240 are from the matrix and the numeric section
of the DIMACS benchmark set. The graphs europe and deu are large road networks of Europe
and Germany taken from [10]. Due to the large running time of our algorithm, we exclude the graph
nlpkkt240 from general comparisons and only use our full algorithm to compute a result. Basic
properties of the graphs under consideration can be found in Appendix Table 2.
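
To make the rggX construction concrete, the following sketch generates a random geometric graph with 2^X nodes and the distance threshold 0.55·sqrt(ln n / n) stated above. It is an illustration only and not the generator used for the DIMACS instances; the function name, the grid bucketing and the seed handling are assumptions.

```python
import math
import random


def random_geometric_graph(x, seed=0):
    """Return (points, edges, radius) for 2**x random points, x >= 1."""
    n = 2 ** x
    radius = 0.55 * math.sqrt(math.log(n) / n)
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]

    # Bucket points into square cells of width >= radius, so that every
    # neighbor of a point lies in its own cell or one of the 8 adjacent cells.
    cells = max(1, int(1 / radius))
    grid = {}
    for i, (px, py) in enumerate(points):
        key = (min(cells - 1, int(px * cells)), min(cells - 1, int(py * cells)))
        grid.setdefault(key, []).append(i)

    edges = []
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        # i < j ensures each undirected edge is emitted once.
                        if i < j and math.dist(points[i], points[j]) < radius:
                            edges.append((i, j))
    return points, edges, radius


points, edges, radius = random_geometric_graph(8)
```

The grid avoids comparing all pairs of points, which matters for instances such as rgg24 with roughly 16.7M nodes.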



4.1 Separator Quality 


We now assess the size of node separators derived by our algorithms and by other state-of-the-art
tools, i.e. Metis and Scotch as well as the data recently presented by LaSalle and Karypis [20].
We use multiple configurations of our algorithm to estimate the influence of the multiplicative
factor α that controls the size of the flow problems solved during uncoarsening and to see the
effect of adding local search. The algorithms named Flow_α use only flows during uncoarsening as
local search with a multiplicative factor α. Algorithms labeled LSFlow_α start on each level with
local search and localized local search until no improvement is found and afterwards perform
flow-based local search with a multiplicative factor α. Table 1 summarizes the results of the
experiments. We present detailed per-instance results in Appendix Table 3 (separator size and
balance) and Table 4 (running times).
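
The two per-instance quantities, separator size and balance, can be computed from a node labeling as sketched below. This is a minimal illustration under an assumption: balance is taken here as the size of the larger block divided by ⌈n/2⌉, which this section does not state explicitly. The labels 1, 2 and "S" and all function names are hypothetical.

```python
import math


def is_node_separator(edges, label):
    """True if no edge joins block 1 and block 2 (labels are 1, 2 or 'S')."""
    return all({label[u], label[v]} != {1, 2} for u, v in edges)


def separator_size(label):
    """Number of nodes labeled as separator nodes."""
    return sum(1 for part in label.values() if part == "S")


def balance(label):
    """Larger block size divided by ceil(n / 2) (assumed definition)."""
    n = len(label)
    sizes = [sum(1 for part in label.values() if part == block) for block in (1, 2)]
    return max(sizes) / math.ceil(n / 2)


# Path 0-1-2-3-4 with node 2 as separator.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
label = {0: 1, 1: 1, 2: "S", 3: 2, 4: 2}

epsilon = 0.20  # imbalance parameter used in the experiments
valid = is_node_separator(edges, label)
size = separator_size(label)
balanced = balance(label) <= 1 + epsilon
```

Under this definition the balance can drop below 1, since separator nodes belong to neither block.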

We now summarize the results. First of all, only using flow-based local search during uncoarsening
is already highly competitive, even for small flow problems with α = 0. On average, Flow_0
computes 6.7% smaller separators than Metis and 57% smaller separators than Scotch. It computes a
smaller or equally sized separator than Metis in 89% of the cases and than Scotch in every case.
However, it also needs more time to compute a result. This is due to the large flow problems that
have to be solved. Indeed, increasing the value of α, i.e. searching for separators in larger areas
around the initial separator, improves the objective further at the cost of running time. For
example, increasing α to 0.5 reduces the average size of the computed separator by 3.2%, but also
increases the running time by more than a factor of 2 on average. Using even larger values of α > 1
did not further improve the result so that we do not include the data here. Adding non-flow-based
local search also helps to improve the size of the separator. For example, it improves the separator
size by 1.8% when using α = 0. However, the impact of non-flow-based local search decreases for
larger values of α.
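
The figures quoted in this paragraph are consistent with the summary values in Table 1, read as differences of the reported average increases and as a ratio of the average running times. The short check below uses only values copied from that table; the variable names are illustrative.

```python
# Values copied from Table 1: average increase in separator size over the
# strongest configuration (in percent) and average running time (in seconds).
avg_increase = {"Flow_0": 3.3, "Flow_0.5": 0.1, "LSFlow_0": 1.5}
avg_time = {"Flow_0": 17.72, "Flow_0.5": 38.21}

# Gain from raising alpha from 0 to 0.5 (flow-only configurations).
gain_from_larger_alpha = round(avg_increase["Flow_0"] - avg_increase["Flow_0.5"], 1)

# Gain from adding non-flow-based local search at alpha = 0.
gain_from_local_search = round(avg_increase["Flow_0"] - avg_increase["LSFlow_0"], 1)

# Running time factor paid for the larger flow problems.
slowdown = avg_time["Flow_0.5"] / avg_time["Flow_0"]
```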

The strongest configuration of our algorithm is LSFlow_1. It computes smaller or equally sized
separators than Metis in all but two cases and than Scotch in every case. On average, separators
are 10.3% smaller than the separators computed by Metis and 62.2% smaller than the ones computed by
Scotch. Figure 3 shows the average improvement ratios over Metis and Scotch on a per-instance


Algorithm     Avg. Inc.   t_avg [s]   ≤ Metis
Metis           10.3%        0.12        -
Scotch          62.2%        0.23        0%
Flow_0           3.3%       17.72       89%
Flow_0.5         0.1%       38.21       96%
Flow_1           0.3%       47.81       94%
LSFlow_0         1.5%       28.61       96%
LSFlow_0.5      -0.1%       49.08       94%
LSFlow_1          -         58.50       96%

Table 1. Avg. increase in separator size over LSFlow_1, avg. running times of the different
algorithms and relative number of instances with a separator smaller or equal to Metis (≤ Metis).







Fig. 3. Improvement of LSFlow_1 per instance over Metis (left) and Scotch (right) sorted by absolute value of ratio.
basis, sorted by absolute value of improvement. The largest improvement over Metis was obtained on
the road network europe where our separator is a factor 2.3 smaller whereas the largest improvement
over Scotch is on add32 where our separator is a factor 12 smaller. On the instance G2_circuit, Metis
computes a 19.9% smaller separator, which is the largest improvement of Metis over our algorithm.
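
A per-instance comparison of this kind can be sketched as follows. The numbers are average separator sizes for Metis and for our strongest configuration, copied from a few rows of Table 3. Whether the figure is based on average or best values is not stated here, so using averages is an assumption, as are the variable names.

```python
# (Metis average, strongest-configuration average) for a few instances.
avg_size = {
    "deu": (241, 145),
    "del24": (3541, 2907),
    "add20": (25, 24),
    "3elt": (42, 42),
    "G2_circuit": (312, 374),
}

# Ratio > 1 means our separator is smaller than the one computed by Metis.
ratios = {graph: metis / ours for graph, (metis, ours) in avg_size.items()}

# Sort instances by improvement ratio, largest improvement first.
ranking = sorted(ratios.items(), key=lambda item: item[1], reverse=True)

# Share of instances with a separator smaller than or equal to Metis.
share_not_worse = sum(ours <= metis for metis, ours in avg_size.values()) / len(avg_size)
```

On this small sample, deu ranks first and G2_circuit last, and four of the five instances are not worse than Metis.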

We now compare the size of our separators against the recently published data by LaSalle
and Karypis [20]. The networks used therein that are publicly available are auto, nlr, del24 and
nlpkkt240. On these graphs our strongest configuration computes separators that are 10.7%, 10.0%,
20.1% and 27.1% smaller than their best configuration (Greedy+Segmented FM), respectively.

5 Conclusion 

In this work, we derived algorithms to find small node separators in large graphs. We presented
a multilevel algorithm that employs novel flow-based local search algorithms and transferred
techniques successfully used in the graph partitioning field to the node separator problem. This
includes the usage of edge ratings tailored to our problem to guide the graph coarsening algorithm
as well as highly localized local search and iterated multilevel cycles to improve solution quality
even further. Experiments indicate that using flow-based local search algorithms as the only local
search algorithm in a multilevel framework is already highly competitive in terms of separator quality.

Important future work includes shared-memory parallelization of our algorithms. Currently,
most of the running time in our algorithm is consumed by the max-flow solver, so a parallel solver
will speed up computations. In addition, it is possible to define a simple evolutionary algorithm
for the node separator problem by transferring the iterated multilevel scheme to multiple input
separators. This will likely result in even better solutions.
A Benchmark Set


Graph                  n          m | Graph                  n          m
Small Walshaw Graphs                | UF Graphs
add20              2 395      7 462 | cop20k_A*         99 843  1 262 244
data               2 851     15 093 | 2cubes_sphere*   101 492    772 886
3elt               4 720     13 722 | thermomech_TC    102 158    304 700
uk                 4 824      6 837 | cfd2             123 440  1 482 229
add32              4 960      9 462 | boneS01          127 224  3 293 964
bcsstk33           8 738    291 583 | Dubcova3         146 689  1 744 980
whitaker3          9 800     28 989 | bmwcra_1         148 770  5 247 616
crack             10 240     30 380 | G2_circuit       150 102    288 286
wing_nodal*       10 937     75 488 | c-73             169 422    554 926
fe_4elt2          11 143     32 818 | shipsec5         179 860  4 966 618
vibrobox          12 328    165 250 | cont-300         180 895    448 799
bcsstk29*         13 992    302 748 | Large Walshaw Graphs
4elt              15 606     45 878 | 598a             110 971    741 934
fe_sphere         16 386     49 152 | fe_ocean         143 437    409 593
cti               16 840     48 232 | 144              144 649  1 074 393
memplus           17 758     54 196 | wave             156 317  1 059 331
cs4               22 499     43 858 | m14b             214 765  1 679 018
bcsstk30          28 924  1 007 284 | auto             448 695  3 314 611
bcsstk31          35 588    572 914 | Large Other Graphs
fe_pwt            36 519    144 794 | del23              ≈8.4M     ≈25.2M
bcsstk32          44 609    985 046 | del24             ≈16.7M     ≈50.3M
fe_body           45 087    163 734 | rgg23              ≈8.4M     ≈63.5M
t60k*             60 005     89 440 | rgg24             ≈16.7M    ≈132.6M
wing              62 032    121 544 | deu                ≈4.4M      ≈5.5M
brack2            62 631    366 559 | eur               ≈18.0M     ≈22.2M
finan512*         74 752    261 120 | af_shell9          ≈504K      ≈8.5M
fe_tooth          78 136    452 591 | thermal2           ≈1.2M      ≈3.7M
fe_rotor          99 617    662 431 | nlr                ≈4.2M     ≈12.5M
                                    | nlpkkt240         ≈27.9M      ≈373M

Table 2. Basic properties of the instances used for evaluation. 


B Detailed per Instance Results 
               | Metis            | Scotch           | LSFlow_0         | LSFlow_0.5       | LSFlow_1         | Flow_0           | Flow_0.5         | Flow_1
Graph          |  Avg.  Best Bal. |  Avg.  Best Bal. |  Avg.  Best Bal. |  Avg.  Best Bal. |  Avg.  Best Bal. |  Avg.  Best Bal. |  Avg.  Best Bal. |  Avg.  Best Bal.
144            | 1 539 1 511 1.13 | 1 639 1 602 1.00 | 1 482 1 467 1.12 | 1 444 1 437 1.19 | 1 445 1 439 1.19 | 1 495 1 481 1.09 | 1 444 1 437 1.20 | 1 446 1 437 1.19
2cubes_sphere  | 1 398 1 335 1.11 | 1 587 1 530 1.00 | 1 265 1 245 1.14 | 1 228 1 221 1.19 | 1 230 1 221 1.19 | 1 274 1 266 1.11 | 1 237 1 221 1.18 | 1 235 1 221 1.18
3elt           |    42    42 1.09 |    50    46 1.00 |    42    42 1.11 |    42    42 1.11 |    42    42 1.11 |    42    42 1.11 |    42    42 1.11 |    42    42 1.11
4elt           |    69    68 1.02 |    82    73 1.00 |    68    68 1.01 |    68    68 1.01 |    68    68 1.01 |    68    68 1.02 |    68    68 1.01 |    68    68 1.01
598a           |   615   603 1.03 |   639   629 1.00 |   594   593 1.04 |   593   593 1.03 |   593   593 1.03 |   594   593 1.04 |   593   593 1.03 |   593   593 1.03
add20          |    25    23 1.09 |   142   128 1.10 |    26    23 1.11 |    23    23 1.08 |    24    23 1.08 |    28    23 1.10 |    23    23 1.08 |    24    23 1.08
add32          |     1     1 1.08 |    14     4 1.00 |     1     1 1.12 |     1     1 1.12 |     1     1 1.12 |     1     1 1.12 |     1     1 1.12 |     1     1 1.12
af_shell9      |   934   885 1.00 | 1 382 1 095 1.00 |   880   880 1.06 |   880   880 1.06 |   880   880 1.06 |   880   880 1.06 |   880   880 1.06 |   880   880 1.06
auto           | 2 109 2 073 1.18 | 3 158 2 547 1.00 | 2 034 2 021 1.19 | 1 986 1 977 1.20 | 1 992 1 978 1.20 | 2 093 2 062 1.17 | 1 992 1 981 1.20 | 1 988 1 978 1.20
bcsstk29       |   180   180 1.00 |   260   234 1.01 |   180   180 1.02 |   180   180 1.11 |   180   180 1.11 |   180   180 1.01 |   180   180 1.11 |   180   180 1.10
bcsstk30       |   208   206 1.04 |   439   393 1.02 |   206   206 1.00 |   206   206 1.00 |   206   206 1.00 |   206   206 1.00 |   206   206 1.00 |   206   206 1.00
bcsstk31       |   298   285 1.07 |   482   437 1.04 |   271   268 1.10 |   268   268 1.17 |   268   268 1.17 |   271   270 1.09 |   268   268 1.17 |   268   268 1.17
bcsstk32       |   276   252 1.19 |   752   463 1.04 |   236   229 1.19 |   239   229 1.18 |   232   229 1.20 |   252   239 1.17 |   239   229 1.18 |   233   229 1.20
bcsstk33       |   421   421 0.96 |   549   179 1.21 |   282   262 1.18 |   267   261 1.20 |   283   265 1.19 |   292   274 1.17 |   272   266 1.20 |   288   266 1.19
bmwcra_1       |   318   318 1.13 | 1 006   576 1.06 |   318   318 1.14 |   350   318 1.13 |   350   318 1.13 |   318   318 1.14 |   350   318 1.13 |   350   318 1.13
boneS01        | 1 583 1 542 1.08 | 4 137 3 969 1.00 | 1 525 1 500 1.04 | 1 500 1 500 1.10 | 1 500 1 500 1.10 | 1 524 1 500 1.04 | 1 500 1 500 1.10 | 1 500 1 500 1.10
brack2         |   182   181 1.07 |   237   214 1.00 |   181   181 1.07 |   181   181 1.07 |   181   181 1.07 |   181   181 1.07 |   181   181 1.07 |   181   181 1.07
cfd2           | 1 040 1 030 1.05 | 1 303 1 163 1.00 | 1 030 1 030 1.06 | 1 030 1 030 1.09 | 1 030 1 030 1.08 | 1 030 1 030 1.06 | 1 030 1 030 1.08 | 1 030 1 030 1.07
cont-300       |   598   598 1.00 |   616   598 1.00 |   598   598 1.00 |   598   598 1.00 |   579   534 1.06 |   598   598 1.02 |   598   598 1.18 |   598   598 1.18
cop20k_A       |   680   660 1.02 | 1 904 1 833 1.00 |   613   613 1.04 |   613   613 1.04 |   613   613 1.04 |   613   613 1.04 |   613   613 1.04 |   613   613 1.04
crack          |    72    69 1.08 |    92    81 1.00 |    69    68 1.13 |    68    68 1.16 |    68    68 1.16 |    69    68 1.13 |    68    68 1.16 |    68    68 1.16
cs4            |   289   281 1.11 |   332   323 1.00 |   281   279 1.09 |   267   264 1.19 |   268   264 1.19 |   284   282 1.08 |   267   265 1.19 |   269   265 1.18
cti            |   268   266 1.00 |   291   283 1.00 |   267   266 0.99 |   266   266 0.98 |   266   266 0.98 |   267   266 1.01 |   266   266 1.00 |   266   266 1.00
data           |    59    45 1.10 |    69    64 1.00 |    44    41 1.17 |    42    41 1.18 |    43    41 1.18 |    45    43 1.15 |    42    41 1.17 |    43    41 1.18
del23          | 2 486 2 434 1.03 | 2 933 2 741 1.00 | 2 050 2 048 1.01 | 2 048 2 048 1.05 | 2 048 2 048 1.04 | 2 050 2 048 1.01 | 2 048 2 048 1.04 | 2 048 2 048 1.04
del24          | 3 541 3 472 1.01 | 4 004 3 792 1.00 | 2 908 2 904 1.01 | 2 907 2 904 1.03 | 2 907 2 904 1.03 | 2 908 2 904 1.01 | 2 907 2 904 1.03 | 2 907 2 904 1.03
deu            |   241   217 1.07 |   325   286 1.00 |   152   152 1.04 |   145   145 1.12 |   145   145 1.12 |   152   152 1.04 |   145   145 1.12 |   145   145 1.12
Dubcova3       |   406   383 1.02 | 1 495 1 395 1.00 |   383   383 1.04 |   383   383 1.16 |   383   383 1.15 |   383   383 1.05 |   383   383 1.16 |   383   383 1.18
eur            |   430   349 1.09 |   620   486 1.01 |   218   109 1.07 |   208   200 1.12 |   206   195 1.13 |   218   109 1.07 |   208   200 1.12 |   206   195 1.13
fe_4elt2       |    66    66 0.99 |    69    67 1.00 |    66    66 0.99 |    66    66 0.99 |    66    66 0.99 |    66    66 1.02 |    66    66 1.04 |    66    66 1.04
fe_body        |    86    65 1.11 |   160   122 1.01 |    78    66 1.12 |    77    61 1.15 |    75    62 1.14 |    78    66 1.12 |    77    61 1.15 |    75    62 1.14
fe_ocean       |   273   263 1.01 |   340   322 1.00 |   263   263 1.02 |   263   263 1.02 |   263   263 1.02 |   263   263 1.02 |   263   263 1.02 |   263   263 1.02
fe_pwt         |   120   120 1.01 |   132   124 1.00 |   116   116 1.03 |   116   116 1.09 |   116   116 1.12 |   116   116 1.03 |   116   116 1.13 |   116   116 1.13
fe_rotor       |   453   441 1.04 |   576   514 1.05 |   441   439 1.07 |   441   439 1.08 |   441   439 1.07 |   441   439 1.08 |   442   439 1.08 |   442   439 1.08
fe_sphere      |   195   192 0.99 |   239   227 1.00 |   192   192 1.04 |   192   192 1.05 |   192   192 1.05 |   192   192 1.02 |   192   192 1.13 |   192   192 1.14
fe_tooth       |   882   867 1.16 | 1 192 1 094 1.00 |   882   869 1.13 |   849   837 1.19 |   848   826 1.19 |   885   882 1.11 |   852   827 1.19 |   853   839 1.19
finan512       |    50    50 1.07 |    67    51 1.02 |    50    50 1.01 |    50    50 1.13 |    50    50 1.13 |    50    50 1.01 |    50    50 1.12 |    50    50 1.13
G2_circuit     |   312   312 1.03 |   416   348 1.00 |   374   312 1.01 |   374   312 1.03 |   374   312 1.03 |   374   312 1.02 |   374   312 1.14 |   374   312 1.14
m14b           |   885   859 1.04 |   895   870 1.00 |   835   834 1.02 |   834   834 1.00 |   834   834 1.00 |   835   834 1.02 |   834   834 1.00 |   834   834 1.00
memplus        |    88    81 1.19 |    95    95 1.00 |    81    72 1.15 |    66    62 1.15 |    68    65 1.15 |   108    76 1.10 |    70    65 1.12 |    72
Table 3. Detailed per-instance results as average and best values for the size of the separator and average balance.

Table 4. Detailed per-instance results as average running time.